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Abstract 



C/2 ' We present in this paper a generic implementation of the Pruned Dynamic Programing 

Algorithm. We discuss the performance of this algorithm compared to that of several 
CN ! algorithms (PELT, CART) - also programed in CH — h to allow a fair comparison. The 

program was written in a full template way, thus allowing a large range of applications and 
\Q ' a convenient way of adding extensions. 
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g : Introduction 



We consider a change-point detection framework where we want to obtain the K max best 
partitions in 1 to K max regions of a signal of size n with respect to some loss or cost 
function. This problem arises when studying series of data, such as climate data (when 
did the Maunder minimum, the little ice age or the Dalton minimum begin?) or financial 
data series. This problem is also frequently encountered in CGH analysis, audio processing 
and Seq-data analysis. For each 1 < k < K max the signal is divided into k contiguous 
and homogeneous segments delimited by k — 1 splitpoints denoted as n, . . . ,Tk-i- For 
convenience, we note To = and Tk = n. For 1 < I < k, a segment r\ is the set of points 

n = {te N|r,_i < t < n } = [r,_i; n{ (1) 

Difficulties arise both because of the large number of possible partitions ((^Zi) f° r a signal 
of length n divided into k segments) and because many possible optimal solutions may 
exist. 

In the following section we briefly present some algorithms related to our problem, 
their issues, and why we felt the need for our implementation of the Pruned Dynamic 
Programming Algorithm. In Section 3 we briefly present the PDPA algorithm, its com- 
plexity and the basic operations it relies on as well as its implementation in C++ and a 
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large range of applications and possible extensions. Finally, In Section 4 we present com- 
parisons of splitpoint positions and runtime between our algorithm and both the DPA 
and the CART heuristic. 



2 Related Work 

Several algorithms exist to address segmentation problems. They can be divided into 
either exact methods, with drawbacks on complexity and capacity, efficient heuristic and 
constrained-segmentation methods. 

The first class essentially consists in improvements of the Dynamic Programing Algo- 
rithm (DPA) introduced by ([1]). It finds an optimal solution with respect to the quadratic 
loss in Q(K max n 2 ) time and Q(n 2 ) space. Improvements include a version that is linear in 
space ([4]), and the Optimal Partitioning method that directly retrieves an optimal seg- 
mentation in an unknown K segments and whose complexity is 0(n 2 ). Nevertheless two 
main issues remain: the quadratic complexity in n restricts the use of this algorithm to 
signals of small length n (reasonable up to n = 10 4 , too long as soon as n > 10 5 ) and the 
lack of versatility (only the Gaussian loss is implemented) makes it an obstacle to its use 
for most data-sets (for instance any count data). Recently, the PELT algorithm (Pruned 
Exact Linear Time [7]) was introduced, that finds the exact optimal segmentation in an 
unknown number k of segments which is estimated within the algorithm. Even though 
under certain conditions this algorithm is linear in time (0(n)), PELT only returns the 
best solution for an unchoosable k. 

The second class of algorithms includes a wide range of methods. One of them is the 
CART algorithm (or Binary Segmentation, [2], [3]), a heuristic procedure to approach 
the best segmentations in 1 to K max segments. The idea is to split a segment into two 
segments at each step by minimizing a criterion, and to keep the best partition for the 
next step. The algorithm can either be stopped when a fixed number of segments is 
reached, or when the cost gets lower than a fixed threshold. Its complexity is bounded 
by uo(n) < C < 0(K max n) in time and by O(k) in space (signal data excluded). This 
method, though faster than the DPA, is not guaranteed to retrieve an optimal solution. 
Other non-optimal methods include optimization of a constrained criterion. One example 
is Harchaoui's procedure ([5]) that consists in optimizing the least-square criterion subject 
to the Lasso constraint \ fi t — fJ>t-i\ < s, where s is a threshold to set. 

In the next section we describe the Pruned Dynamic Programing Algorithm (PDPA) 
proposed in [9] and it's C++ implementation. It represents a considerable improvement 
to the existing methods as it is exact, faster than the DPA (its time complexity is bounded 
by bj[K max n) < C < 0(K max n 2 ) and is empirically in 0(Knlog(n))), thrifty (its space 
complexity is bounded by Q(Kn)) , and highly versatile. 
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3 The algorithm 



3.1 Framework: 

Let us assume that we have a sequence of length n: {Yt\te\i.n\- We define Aik,t the set 
of all possible partitions in k > regions of the sequence up to point t. The number 
of possible partitions, card{Aik,t), is QZi)- The goal of the pruned DPA is to obtain 
for all k < K max the contiguous partition m in M. k ,t of minimal loss (Lavielle 2005): 
g m c ( r )' with c(r) the cost of region r of m defined as: 

i 6 r 

c(r) = mm {M 6 K}{c(r,/^)}, 

where 7 is a loss function and g is a regularization penalty. 

We define M k >t as the optimal contiguous partition in k regions up to t and C k> t as its 
optimal cost: 



M htt = argmin {m 6 Mk t} 
C h ,t = min {m g Mfc)t j <^ 



c(r) > and 
r € m ) 



c[r) 

r £ m 



As Cfc t is region- additive, the following update equation holds: 

V t > k > 2 C k>t = min { C fc _ 1)T + c([r + 1, t[) } (2) 

fc— l<T<t 

Using update equation 2, the original DPA performs t comparisons at each step and 
retrieves all {Ck, n }ke[i,K max ] in exactly Q{K max n 2 ) runtime. 



3.2 Overview of the PDPA 

The PDPA studies the function Hk,t(/j) which is the cost of the best contiguous partition 
in k regions with a last region parameter fi: 

HkM = , C k-i,r + c([r + 1, *],//) }, 

fc— l<r<< 

and from there gets C k) t as min^{Hk^t{^)}- More precisely, for each total number of 
regions, k, from 2 to i^, the pruned DPA works on a list of last splitpoint candidates: 
ListCandidatefc. For each of these candidate splitpoint, r, the algorithm stores a cost 
function and a set of optimal-cost intervals. To be more specific, we define: 

• H kt (/j,) = C k)T + 5_)!- =r +i l^Xji /-0 + fl'(/ i ) : the optimal cost if the last splitpoint is r; 

• Sj tt = {// I Hl t (n) < H kt t{^) }'■ the set of \x such that r is optimal; 
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• Ikt = I H-knif 1 ) — Hlnif 1 ) }'■ the set of \i such that r is better than t, with 
t < t. 

We obviously have H kt ({i) = min T < t {iJ£ t (/i)}. 

The PDPA rely on four basic properties of these quantities: 

1. if all X^ir+i r 1^Xh / i ) are unimodal in /i then IJ. t are intervals; 

2. it is straightforward to get H^ t+1 (fj,) from H^ t (fi) using: 

Hk, t+ M= #£»+7(*W); 

3. it is easy to update S^i+i using: 

-Sfe,t = R \ ( u re[fc-i,t-i]-^fc,t); 

4. once it has been determined that Sj, t is empty, r can be discarded from the list of 
last splitpoint candidates: 

S T k>t = =>> V t' > t 5^, = 0. 

3.3 Complexity of the algorithm 

If all Y^jtX+i 7 0*3') Z 1 ) are unimodal in /i then the worst case complexity of the algorithm 
can be bounded in Q(Kn 2 ) time and Q(Kn) space (see [9]). In other words the algorithm 
is at worst equivalent to the original DPA and its improvement by Guedon. This worst 
case scenario occurs when one has to segment linear data (namely Vi, Yi = ai for some 
a G M). In that case all positions are kept as possible splitpoints at each step of the 
algorithm and 0(1) comparisons need to be performed at step I, which gives a quadratic 
time complexity. 

However, for many profiles the algorithm is empirically faster. For a constant profile 
the time complexity is in Q(Kn). For such a signal the number of splitpoint candidates 
at each step is only 1. Empirically, at least for CGH profiles, the number of candidates 
stored at each step of the algorithm seems to be in log(/) rather than / and the time 
complexity is in 0(Kn\og(n)). 

On a regular computer, the runtime to process a million-point profile and retrieve 
the best segmentations in 1 to 50 regions is a few minutes. The runtime of the original 
algorithm is several days. 

3.4 General architecture of the package 

The main work presented in this paper is the engineering of third author's algorithm. The 
difficulties we had to come through were the versatility of the program to be written and 
the design of the objects it could work on. Indeed, the use of full templates implied that 
we used stable sets of objects for the operations that were to be performed on. 
Namely: 
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• The sets were to be chosen in a tribe. This means that they all belong to a set S 
of sets such that any set s 6 5 can be conveniently handled and stored into the 
computer and such that 

1. if s belongs to s, R \ s G S 

2. if Si, s 2 G S, Si D s 2 G S 

3. if si, s 2 6 5, si U s 2 G S 

(We call such a set of sets S an acceptable set of sets.) 

• The cost functions were chosen in a set J 7 such that 

1. each function may be conveniently handled and stored by the software 

2. for any / G J 7 , the software can easily solve f(x) = and the set of solutions 
belongs to an acceptable set of sets 

3. for any / G J 7 and any constant c, the software can easily solve f(x) < c and 
the set of solutions belongs to an acceptable set of sets 

4. for any /, g G J 7 , f + g G J 7 . 

This is why we chose the sets of sets to be chosen in collections of intervals or paral- 
lelepipeds and implemented the loss functions corresponding to negative binomials, Pois- 
son or normal laws. One can also incorporate a penalty function g(fi). This function 
should not depend on the segment characteristic. Typically one can choose g(/j) = A|/i| 
with A > 0. 

Of course the program is designed in a way that any user could add his own cost 
function or acceptable set and use it without rewriting a line in the code. 

3.5 Link with the algorithm 

For each possible number k of regions, from 2 to K max , the pruned DPA proceeds schemat- 
ically as follows. First, the list of candidates is initialized as {k — 1} because the first 
possible last splitpoint is — 1. Then for every new data point t > k the pruned DPA 
proceeds in five steps: 

1. all candidate cost functions f (/f)) are compared to the new candidate t to retrieve 
the interval l T k t = {/i | C fc _i iT + Y?j=r+i l( Y j' A*) - C k-\,t } (property 1); 

2. all candidate cost functions (if£ t (/i)) are updated using property 2; 

3. all candidate sets of intervals (S kt ) are updated using the computed and property 
3; 

4. C k>t is computed as min ListCandidate)s {min At {^ t (//)}} 

5. all candidates with an empty Sj :t are discarded using property 4. 
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To implement these five steps in a versatile fashion we define two template classes, 
one for the cost function and one for the set of intervals. The template class for a set of 
intervals has a method to get the complement of a union of intervals and a method to 
compute its self-intersection with an interval. The template class for the cost function 
has a + operation (to update the cost function), a method to find its minimum and a 
method to compute the set of values for which it is smaller than a given constant (i£ f ). 

The core of the algorithm is written using these template classes. In this way only 
a cost function class needs to be specified. As an example and for testing purposes we 
specified the quadratic, Poisson and negative binomial cost functions. In the quadratic 
function class the obtention of the interval for which it is minimal is internally computed 
using analytical formulae whereas in the case of the Poisson and negative binomial this is 
done using Newton-Raphson's method. 

4 Results and comparisons 
4.1 Comparisons of performance 

In order to illustrate the validity of our code we have performed a number of simulations for 
which we have compared the segmentations and costs given by the Original DPA (ODPA) 
and the Pruned DPA (PDPA). We implemented Guedon's version of the algorithm in 
C++ to allow for fair comparisons with many losses (available versions are usually in R 
or matlab and only for the quadratic loss). A number of possibilities regarding the choice 
of the size of the signals (varying between 100 and 10 000 points) and number of segments 
(range of 10 to 150) have been tested for the quadratic, Poisson and negative binomial 
losses. For each combination, 1000 simulations have been tested, each time assuming the 
number of segments is given. (For more details regarding the simulation procedure see 
the Appendix). 

In all simulations, our algorithm retrieved the same cost as that given by the ODPA 
with a tolerated error of up to 10 -10 to allow for differences of computer approximations in 
the computations. The percentage of identical segmentations (i. e. the number of times 
both algorithms chose the same splitpoints) for the different losses varies between 99 and 
100%. This is due to the possible existence of different equivalent segmentations that are 
optimal with respect to the log-likelihood criteria. In the particular case of discrete data, 
this configuration is frequent. We show an example where there are three different and 
non trivial optimal segmentations in two segments for the quadratic loss (see Figure 1). 

We then compared our result to the CART algorithm. Because no C++ version was 
available, we implemented it for the quadratic loss and only performed the comparisons 
for the latter. 

CART runs the following way: one has an interval, say /, to be split into k pieces but 
the splits are computed and decided one by one. At the first step one choses the best way 
to split / in two parts, I\ and J| such that I\ U l\ = I. 

At the second step one has to compute the best splitting of l\ and l\ and keep the 
best of them. Thus one splits one of the two intervals l\ and l\ and obtains If, if and 
7| and so on. . . 
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Figure 1: Three equivalent optimal segmentations with the quadratic loss 
The black triangles represent the signal, the colored lines are equivalent optimal segmentations. 

Done naively, this leads to a quadratic (in k) number of comparisons. Instead we used 
a heap that allows for the complexity of CART to be in O(klogk) instead of the 0(k 2 ) 
usually observed. 

Once again simulations were run for a large combination of signal sizes and segment 
numbers. Figure 4.1 shows the percentage of correct splitpoints found by CART (black 
circles) and the relative error between the best cost (obtained by PDPA) and the cost of 
the segmentation chosen by CART (red triangles) when the size of the signal increases (a.) 
and when the number of segments increases (b.) We can see that CART performs well 
when the signal is large and the number of segments is small, but fails to retrieve optimal 
segmentations when the ratio of the size of the signal over the number of segments gets 
small. In a total of 2200 simulations, CART retrieved the correct segmentation 43 times. 

Let us finally emphasize again the fact that our software works with many possible 
loss functions or cost functions and sets (see subsection 3.4.) 

4.2 Comparison of runtimes 

In this section we compared the speed performance of the three algorithms. Because the 
original dynamic programing algorithm is known to be quadratic in time, the comparison 
with it was done only for signals up to 100, 000 points. For larger signals experiments 
were done individually. The reader may refer to Figure 3 for a summary of results. 

Then we compared the performance of PDPA and CART regarding speed. While the 
latter is faster, it is important to note that the shape of the curves is identical, as shown 
in Figure 4. 

While implementing CART, one is confronted with the following problem. At step I 
one has / intervals partitioning the initial interval and has the score of each of them. The 
interval obtaining the best score needs to be split in two, its two parts need to have their 
score computed and to be added to the pool of intervals. 

The best way to handle this problem is to use a heap and this is what the authors did 
for the comparisons below. 

Let us now compute an approximation of the number of operations needed for both 
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Fixed number of segments (15) Fixed size of signals (10000) 




te-f03 5e-HQ3 1et€4 5e+-04 1e+05 5e+C5 1e+06 20 4D 60 50 100 12-0 140 



Figure 2: Comparison of the performance of CART and PDPA 

On the left figure the X axis represents the size of the signal on the log-scale, on the right figure the 
number of segments. For both figures, the left Y axis represents the percentage of right splitpoints 
retrieved by CART, the right Y axis the relative error of CART when compared to the cost of the 
optimal segmentation (given by PDPA). 
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Figure 3: Comparison of the runtimes of ODPA and PDPA 



Both figures compare the speed of OPDA and PDPA for signals with varying sizes (given on the X axis) 
but a fixed number of segments (f5). The results are given in seconds (Y axis). 
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Fixed number of segments (15) Fixed size of signals (10000) 
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Figure 4: Comparison of the runtimes of CART and PDPA 

On the left figure the X axis represents the size of the signal on the log-scale, on the right figure the 
number of segments. For both figures, the left Y axis represents the speed of the PDPA while the right 
Y axis represents the speed of CART, both in seconds. Even though CART is a lot faster, the shape of 
the curves are similar. 



algorithms. At each new step, in order to find the best division of a segment of length 
/ into 2 pieces, CART requires 71 — 3 elementary operations. Thus the worst number of 
elementary operations is bounded by K max (7n — 3) + ^2^ max log 2 (fc). (The term \og 2 (k) 
comes from the use of the heap.) 

If the splits are quite regularly distributed, one obtains (7n— 3) ^og 2 (K max )+Y2^ max \og 2 (k). 
Therefore one may expect a n\og 2 (K max ) time complexity. 

Now PDPA needs at each step to find the roots and minimum of cost functions for 
each candidate, for which it requires 13 elementary operations. Intersecting two intervals 
requires 2 elementary operations. If the pruning is perfect (one candidate left at each step) , 
then PDPA will require 15K max n elementary operations. Therefore the best configuration 
for PDPA is still worse than the worst configuration for CART. 

Of course one should keep in mind that CART does not find the best segmentations 
but in most cases quickly finds quite good segmentations. 

In our simulation study we chose the number of segments and the size of the signal 
in order to analyse the general trend of the algorithms and to mimic signals expected in 
many applications: for instance, a segmentation in 15 segments is common in CGH-array, 
a length of 10 6 or 10 7 is common with Seq-dat, etc. 

Recently, ([6], 2012) compared the ability of several segmentation approaches to seg- 
ment real Comparative Genomic Hybridization data. They show in table 3 (see line 
cghseg.k) that the PDPA (we implemented a preliminary version of it in the cghseg pack- 
age) coupled with an appropriate model selection scheme outperforms existing method in 
terms of error. This demonstrate the advantage of the PDPA on real CGH data. 
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5 Discussion and Conclusion 



We described a generic C++ implementation of the PDPA described in [9] and that gets 
the best partitions in 1 to K max regions for a given loss function. We made an extensive 
comparison of the result of our code with the one obtained with the ODPA to illustrate 
its validity, and we implemented several loss functions (including the frequently used 
quadratic one) in our code to demonstrate its versatility. A user who might be interested 
in a particular loss function adapted to his data would only need to give its expression, 
and formulae to compute its minimum and roots (and is welcome to write to one of the 
authors for help). So far the implementation is limited to a set of one-parameter loss 
functions, but future work should include an extension to multi-parameter losses. 

The comparison of the result of the PDPA to the CART heuristic proves the advantage 
of retrieving the best partitions and a preliminary implementation of our code was recently 
shown to provide state of the art results on real CGH data (Hocking et al 2012). Moreover, 
this implementation can be used as a base for further analysis. For example, [8] use it to 
initialize their Hidden Markov Model to compute change-point location probabilities. 

Our code and an R package including our C++ code are available on the CRAN (or 
send an e-mail to one of the authors). 

APPENDIX 

The signals used for the comparisons between CART, ODPA and PDPA were generated 
as described in this paragraph. Once the size of the signal, the number of segments and 
the generating function (normal, Poisson or negative binomial laws) were chosen, the 
splitpoints were equally spaced along the signal and the parameters chosen as follows: 

• for the normal law, the variance parameter cr 2 was set to 1 along the sequence. 
Then for each segment k, the mean parameter \x was chosen equal to /i& = 2r with 
r = k — Qq and 1 < r < 6. 

• Similarly, the parameter of the Poisson was subject to the same pattern (A& = 2r 
with r = k — 6q and 1 < r < 6). 

• For the negative binomial law we used the parametrization AfB(fi, <p) where <fi = 
E(X) 2 /(Var(X)-E(X)) and fi = (Var(X) - E(X))/Var(X). For the simulations, 
4> was fixed equal to 0.6 for the whole sequence and we selected — l/(r + 1) with 
r = k — 6q and < r < 6. 
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